Spatial Data Science with R

GEO 503

Adam M. Wilson

Assistant Professor of Global Environmental Change
Geography Department

I Use R:

  • GIS (with a little GRASS)
  • Statistics
  • Graphs
  • HTML/Websites (including this one!)

Today's plan

  1. Course website (UBLearns) and syllabus
  2. What is R?
  3. Who uses it?
  4. Reproducible Research
  5. Guided interactive coding

Course Structure

Mondays 5-7:40 (2 hours 40 min)

  • Review/Questions
  • 30 Minute Presentation
  • Semi-guided interactive exercises on your laptops

3 Learning Objectives

  1. Become familiar with R programming language
  2. Learn to code geospatial analyses in R
  3. Learn to develop reproducible research workflows

This course is NOT

A statistics course (see GEO 505, etc.). We will focus on workflow and methods (‘how’ not ‘why’)

–Grolemund & Wickham, R for Data Science, O'Reilly 2016

Why program?

Add ArcGIS screenshot

Graphical User Interfaces are useful,

A man who does not know foreign language is ignorant of his own.
– Johann Wolfgang von Goethe (1749 - 1832)

Learning a programming language can help you learn how to think logically.

The computer is incredibly fast, accurate, and stupid. Man is unbelievably slow, inaccurate, and brilliant. The marriage of the two is a force beyond calculation.
– Leo Cherne

Programming gives you access to more computer power.

UBLearns.buffalo.edu

Class Survey

Software ArcGIS 94% Python 29% R 29% SPSS 29% Google Earth Engine 24% Erdas Imagine 24%

Scripting

Yes 71% No 29%

Used R?

No 52%

From Graphical User Interface (GUI) to Scripting

R Project for Statistical Computing

  • Free and Open source
  • Data manipulation
  • Data analysis tools
  • Great graphics
  • Programming language
  • 6,000+ free, community-contributed packages
  • A supportive and increasing user community

R is a dialect of the S language developed at Bell Laboratories (formerly AT&T) by John Chambers et. al. (same group developed C and UNIX©)

What is the R environment?

  • effective data handling and storage facility
  • suite of operators for (vectorized) calculations
  • large, coherent, integrated collection of tools for data analysis
  • graphical capabilities (screen or hardcopy)
  • well-developed, simple, and effective programming language which includes:
    • conditionals
    • loops
    • user defined functions
    • input and output facilities

Reproducible, Portable, & Transparent

R Graphics

Spatial data in R

Packages: sp, maptools, rgeos, raster, ggmap Examples: species range overlays

source

Basemaps with ggmap

Parallel Processing

For BIG jobs: multi-core processors / high performance computing with foreach.

Strengths & Limitations

  • Just-in-time compilation
    • Slower than compiled languages
    • Faster to compose
  • Many available packages
  • Most operations conducted in RAM
    • RAM can be limiting and/or expensive
    • Error: cannot allocate vector of size X Mb
    • Various packages and clever programming can overcome this…
  • Free like beer AND speech!

R Interface

But there are other options…

R in Mac

R in Windows

Who uses R?

“Analytics” Jobs on indeed.com

Feb 2014: http://r4stats.com/articles/popularity/

Scholarly articles by software package

http://r4stats.com/articles/popularity/

Change in scholarly articles

http://r4stats.com/articles/popularity/

Number of scholarly articles found in the most recent complete year (2014) for each software package used as a topic or tool of analysis. For methods see here.

Forum/discussion activity

http://r4stats.com/articles/popularity/

Rexer Analytics Data Miner Survey (2013)

~1.2k respondents http://r4stats.com/articles/popularity/

240 Books on R since 2000

R Development

Over 6k packages!

Task Views organize packages by topic

http://cran.r-project.org/web/views/

Reproducible Research

  • The ability to reproduce results from an experiment or analysis conducted by another*
  • Developed from literate programming:
  • Logic of the analysis is represented in output
  • Combines computer code with narrative

Reproducible Research

Typical GUI Workflow Organized and repeatable workflow (and some example commands)

Following Along

RStudio

Following Along

Assessment / Grading

  • Course Participation (10%) Actively participation
  • Package Presentation (10%) Overview of a R package of your choice
  • Homeworks (30%)
  • Final Project (50%) a dynamically generated set of figures (like an infographic) of a spatial analysis related to each student’s interest. Report will be uploaded to UBlearns as a PDF file with RMarkdown source code. This project can be related to the student’s own research or a separate topic.

Homework

Working collaboratively is encouraged but you are responsible for developing your own code to answer the questions.

  • Acceptable: “which functions did you use to answer #4?”
  • Unacceptable: “please email me your code for #4.”

Homework format

#' ## Question 1
#' Load the iris dataset by running
## ------------------------------------------------------------------------
data(iris)
#' And read about the dataset in the documentation:
## ------------------------------------------------------------------------
?iris
#' > How many observations (rows) are there for the versicolor species?
#' _______________________
#' ## Question 2
#' Create a vector with the following values:  23, 45, 12, 89, 1, 13, 28, 18. 
"' Then multiply each element of the vector by 15.  
#' > What is the standard deviation of the new vector? 

Questions?